About latent roles in forecasting players in team sports
Forecasting players in sports has grown in popularity due to the potential
for a tactical advantage and the applicability of such research to multi-agent
interaction systems. Team sports contain a significant social component that
influences interactions between teammates and opponents, yet this component
has not been fully exploited. In this work, we hypothesize that each participant
has a specific function in each action and that role-based interaction is
critical for predicting players' future moves. We create RolFor, a novel
end-to-end model for Role-based Forecasting. RolFor uses a new module we
developed called Ordering Neural Networks (OrderNN) to permute the order of the
players such that each player is assigned to a latent role. The latent roles are
then modeled with a RoleGCN, whose graph representation provides
fully learnable adjacency matrix that captures the relationships between roles
and is subsequently used to forecast the players' future trajectories.
Extensive experiments on a challenging NBA basketball dataset confirm the
importance of roles and justify our goal of modeling them with optimizable
models. When an oracle provides roles, the proposed RolFor compares favorably
to the current state-of-the-art (it ranks first in terms of ADE and second in
terms of FDE errors). However, training RolFor end-to-end raises the issue of
the differentiability of permutation methods, which we review experimentally.
Finally, this work frames differentiable ranking as a difficult open problem
with great potential in conjunction with graph-based interaction models.
The project is available at: https://www.pinlab.org/aboutlatentroles
Comment: AI4ABM@ICLR2023 Workshop
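The abstract does not specify how OrderNN relaxes the discrete player-to-role assignment; a common differentiable surrogate for hard permutations is Sinkhorn normalization, which the NumPy sketch below illustrates. The function name and toy sizes are our illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sinkhorn(scores, n_iters=50):
    """Project a score matrix onto an (approximately) doubly-stochastic
    matrix by alternating row and column normalization (Sinkhorn)."""
    P = np.exp(scores)
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)  # rows sum to 1
        P = P / P.sum(axis=0, keepdims=True)  # columns sum to 1
    return P

# Toy example: 4 players scored against 4 latent roles.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
P = sinkhorn(scores)
# Each player is softly assigned to a role; a hard assignment
# can be read off with argmax (non-differentiable step).
roles = P.argmax(axis=1)
```

Because every entry of `P` stays positive and gradients flow through the normalizations, such a relaxation can sit inside an end-to-end model, whereas the final argmax is exactly the kind of non-differentiable step the abstract flags as an open problem.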
Staged Contact-Aware Global Human Motion Forecasting
Scene-aware global human motion forecasting is critical for numerous
applications, including virtual reality, robotics, and sports. The task
combines human trajectory and pose forecasting within the provided scene
context, which represents a significant challenge.
So far, only Mao et al. NeurIPS'22 have addressed scene-aware global motion,
cascading the prediction of future scene contact points with the estimation of
the global motion, which they perform as the end-to-end forecasting of future
trajectories and poses. However, the end-to-end design contrasts with the
coarse-to-fine nature of the task and results in lower performance, as we
demonstrate here empirically.
We propose STAG (STAGed contact-aware global human motion forecasting), a
novel three-stage pipeline for predicting global human motion in a 3D
environment. First, we represent the interaction between the human and the
scene as contact points. Second, we forecast the human trajectory within the
scene, predicting the coarse motion of the human body as a whole. The third
and last stage completes the trajectory with plausible fine-grained joint
motion, consistent with the estimated contacts.
Compared to the state-of-the-art (SoA), STAG achieves a 1.8% and 16.2%
overall improvement in pose and trajectory prediction, respectively, on the
scene-aware GTA-IM dataset. A comprehensive ablation study confirms the
advantages of staged modeling over end-to-end approaches. Furthermore, we
establish the significance of a newly proposed temporal counter called the
"time-to-go", which indicates the time remaining before reaching a scene
contact or the trajectory endpoint. Notably, STAG generalizes to datasets
lacking a scene and achieves a new state-of-the-art performance on CMU-Mocap,
without leveraging any social cues. Our code is released at:
https://github.com/L-Scofano/STAG
Comment: 15 pages, 7 figures, BMVC23 oral
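The "time-to-go" counter can be pictured as a simple per-frame countdown to a known contact or endpoint frame. The sketch below is our minimal interpretation of that idea, not the paper's implementation.

```python
import numpy as np

def time_to_go(seq_len, contact_frame):
    """Per-frame countdown: number of frames left before the contact
    event, clamped at zero once the contact frame has been reached."""
    t = np.arange(seq_len)
    return np.maximum(contact_frame - t, 0)

# Toy example: an 8-frame sequence whose contact occurs at frame 5.
ttg = time_to_go(8, contact_frame=5)
# ttg == [5, 4, 3, 2, 1, 0, 0, 0]
```

Feeding such a counter to each prediction stage gives the model an explicit notion of how imminent the contact is, which is consistent with the coarse-to-fine staging the abstract advocates.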
Contracting Skeletal Kinematic Embeddings for Anomaly Detection
Detecting the anomaly of human behavior is paramount to timely recognizing
endangering situations, such as street fights or elderly falls. However,
anomaly detection is complex: anomalous events are rare, and it is
an open-set recognition task, i.e., what is anomalous at inference has not been
observed at training. We propose COSKAD, a novel model which encodes skeletal
human motion by an efficient graph convolutional network and learns to COntract
SKeletal kinematic embeddings onto a latent hypersphere of minimum volume for
Anomaly Detection. We propose and analyze three latent space designs for
COSKAD: the commonly-adopted Euclidean, and the new spherical-radial and
hyperbolic volumes. All three variants outperform the state-of-the-art,
including video-based techniques, on the ShanghaiTech Campus, the Avenue, and on
the most recent UBnormal dataset, for which we contribute novel skeleton
annotations and the selection of human-related videos. The source code and
dataset will be released upon acceptance.
Comment: Submitted to Pattern Recognition Journal
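Contracting embeddings onto a hypersphere of minimum volume is in the spirit of one-class objectives such as Deep SVDD. The following NumPy sketch shows that idea; the loss and score functions are illustrative assumptions, not COSKAD's exact design.

```python
import numpy as np

def hypersphere_loss(embeddings, center):
    """One-class objective: mean squared distance of the embeddings to a
    fixed center. Minimizing it contracts normal samples onto a sphere
    of minimum volume around the center."""
    d2 = ((embeddings - center) ** 2).sum(axis=1)
    return d2.mean()

def anomaly_score(embedding, center):
    # At inference, the distance to the center serves as the anomaly score:
    # embeddings far from the learnt sphere are flagged as anomalous.
    return np.linalg.norm(embedding - center)

center = np.zeros(3)
normal_batch = np.array([[0.1, 0.0, 0.0]])   # close to the center
odd_sample = np.array([5.0, 5.0, 5.0])       # far from the center
```

This captures why the open-set setting is tractable here: training only needs normal data, and anything mapped far from the center at inference is scored as anomalous.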
Space-time-separable graph convolutional network for pose forecasting
Human pose forecasting is a complex structured-data sequence-modelling task,
which has received increasing attention, also due to numerous potential
applications. Research has mainly addressed the temporal dimension as a time
series and the interaction of human body joints with a kinematic tree or a
graph. This has decoupled the two aspects and leveraged progress from the
relevant fields, but it has also limited the understanding of the complex
structural joint spatio-temporal dynamics of the human pose. Here we propose a
novel Space-Time-Separable Graph Convolutional Network (STS-GCN) for pose
forecasting. For the first time, STS-GCN models the human pose dynamics only
with a graph convolutional network (GCN), including the temporal evolution and
the spatial joint interaction within a single-graph framework, which allows the
cross-talk of motion and spatial correlations. Concurrently, STS-GCN is the
first space-time-separable GCN: the space-time graph connectivity is factored
into space and time affinity matrices, which bottlenecks the space-time
cross-talk, while enabling full joint-joint and time-time correlations. Both
affinity matrices are learnt end-to-end, which results in connections
substantially deviating from the standard kinematic tree and the linear-time
time series. In experimental evaluation on three complex, recent and
large-scale benchmarks, Human3.6M [Ionescu et al. TPAMI'14], AMASS [Mahmood et
al. ICCV'19] and 3DPW [Von Marcard et al. ECCV'18], STS-GCN outperforms the
state-of-the-art, surpassing the current best technique [Mao et al. ECCV'20]
by over 32% on average at the most difficult long-term predictions, while only
requiring 1.7% of its parameters. We explain the results qualitatively and
illustrate the graph interactions by the factored joint-joint and time-time
learnt graph connections.
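The saving from the space-time factoring can be sketched as follows: instead of a full (T·J)×(T·J) space-time adjacency, one learns a T×T time affinity and a J×J joint affinity. The NumPy sketch below shows the idea only; layer details such as activations, channel handling, and normalization from the actual STS-GCN are omitted.

```python
import numpy as np

def sts_layer(X, A_time, A_space, W):
    """One separable space-time graph convolution: temporal mixing (T x T),
    spatial mixing (J x J), then a per-node feature map W. The full
    space-time adjacency is factored into A_time and A_space, which
    bottlenecks space-time cross-talk while keeping full joint-joint
    and time-time correlations."""
    Y = np.einsum('ts,sjc->tjc', A_time, X)   # mix information over time
    Y = np.einsum('jk,tkc->tjc', A_space, Y)  # mix information over joints
    return Y @ W                              # feature transform per node

# Toy sizes: 10 frames, 22 joints, 3 coordinates per joint.
T, J, C = 10, 22, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(T, J, C))
A_time = rng.normal(size=(T, T)) * 0.1   # learnt, not fixed to a time chain
A_space = rng.normal(size=(J, J)) * 0.1  # learnt, may deviate from the kinematic tree
W = rng.normal(size=(C, C))
out = sts_layer(X, A_time, A_space, W)
# Factored affinity parameters: T*T + J*J = 584, versus (T*J)**2 = 48400
# for a dense space-time adjacency over the same toy sizes.
```

Because both affinity matrices are unconstrained and learnt end-to-end, nothing forces `A_space` to match the kinematic tree or `A_time` to be a simple temporal chain, which mirrors the abstract's observation about the learnt connections.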
Pose Forecasting in Industrial Human-Robot Collaboration
Pushing back the frontiers of collaborative robots in industrial environments,
we propose a new Separable-Sparse Graph Convolutional Network (SeS-GCN) for
pose forecasting. For the first time, SeS-GCN bottlenecks the interaction of
the spatial, temporal and channel-wise dimensions in GCNs, and it learns
sparse adjacency matrices by a teacher-student framework. Compared to the
state-of-the-art, it only uses 1.72% of the parameters and it is ∼4 times
faster, while still performing comparably in forecasting accuracy on
Human3.6M at 1 s in the future, which enables cobots to be aware of human
operators. As a second contribution, we present a new benchmark of Cobots and
Humans in Industrial COllaboration (CHICO). CHICO includes multi-view videos,
3D poses and trajectories of 20 human operators and cobots, engaging in 7
realistic industrial actions. Additionally, it reports 226 genuine collisions,
taking place during the human-cobot interaction. We test SeS-GCN on CHICO for
two important perception tasks in robotics: human pose forecasting, where it
reaches an average error of 85.3 mm (MPJPE) at 1 s in the future with a run
time of 2.3 ms, and collision detection, by comparing the forecasted human
motion with the known cobot motion, obtaining an F1-score of 0.64.
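Collision detection by comparing forecasted human motion with the known cobot motion can be sketched as a per-frame distance test. The threshold, shapes, and use of a single cobot point below are our illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def collision_risk(human_pred, cobot_traj, threshold=0.1):
    """Flag frames where any forecasted human joint comes within
    `threshold` metres of the known cobot position.
    human_pred: (T, J, 3) forecasted joint positions
    cobot_traj: (T, 3) known cobot trajectory (e.g. end-effector)"""
    d = np.linalg.norm(human_pred - cobot_traj[:, None, :], axis=-1)
    return d.min(axis=1) < threshold  # per-frame boolean risk flag

# Toy example: a static human at the origin, a cobot that passes close
# to the human at frame 2 only.
T, J = 5, 4
human = np.zeros((T, J, 3))
cobot = np.ones((T, 3))
cobot[2] = 0.05
flags = collision_risk(human, cobot)
# flags == [False, False, True, False, False]
```

The quality of such a detector rests entirely on the forecasted human motion, which is why the abstract evaluates pose forecasting accuracy and collision detection together.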